ABSTRACT

In this paper, the detection of response patterns
aberrant from the Rasch model is considered. For this purpose, a new
person fit index, recently developed by I. W. Molenaar (1987), and an
iterative estimation procedure are used in a simulation study of
Rasch model data mixed with aberrant data. Three kinds of aberrant
response behavior are considered: (1) guessing to complete the test;
(2) guessing in accordance with the three-parameter logistic model;
and (3) responding with different abilities on different subsets of
items. The power in detecting such aberrants is evaluated in two
cases: when item difficulties are known; and when item difficulties
are estimated from the data, including aberrants. The results reveal
that, in the latter case, the estimates of the model parameters are
biased and that the power of the index, as a consequence, is reduced.
It is shown that, by using an iterative procedure, the recovery of 
the power of the index to the level obtained by known item 
difficulties is achieved. Furthermore, depending on the type of 
aberrance, a considerable reduction of the bias in the model 
parameters is possible. Finally, it is confirmed that this new index 
allows detection of aberrant response patterns with better 
statistical properties than former person fit indices. Three data 
tables and eight graphs are included. (Author/TJH) 
Abstract

In this paper, the detection of response patterns aberrant from the
Rasch model is considered. For this purpose, a new person fit
index, recently developed by Molenaar, and an iterative estimation
procedure are used in a simulation study of Rasch model data mixed
with aberrant data. Three kinds of aberrant response behavior are
considered: guessing to complete the test, guessing in accordance
with the three-parameter logistic model, and responding with
different abilities on different subsets of items. The power in
detecting such aberrants is evaluated in two cases: item
difficulties known, and item difficulties estimated from data
including aberrants. The results reveal that in the latter case,
the estimates of the model parameters are biased and the power of
the index, as a consequence, is reduced. It is shown that, using an
iterative procedure, the recovery of the power of the index to the
level obtained by known item difficulties is achieved. Furthermore,
depending on the type of aberrance, a considerable reduction of the
bias in the model parameters is possible. Finally, it is confirmed
that this new index allows us to detect aberrant response patterns
with better statistical properties than former person fit indices.
Introduction



In applications of the Rasch model (RM) it is often assumed that
the data may contain response patterns from a minority of persons
whose response behavior is aberrant. In such a case, the aberrants'
response patterns should be detected and then treated separately,
because the estimates of the aberrants' true ability are mostly not
appropriate or, at least, less reliable. Further, the removal of
aberrants from the data might result in a reduction of the bias in
the estimates of the model parameters.

The decision as to which kind of behavior a person displays can only
be taken on the basis of his/her response pattern. In order to detect
aberrant response patterns, various indices have been proposed
(e.g., Drasgow, Levine & Williams, 1985; Molenaar & Hoijtink, 1987;
Smith, 1985, 1986; Tatsuoka, 1984; Trabin & Weiss, 1983; Wright &
Stone, 1979, chap. 4, chap. 7). For a review of the indices and
their application, see Kogut (1986). Usually, aberrant response
patterns in the data are detected by comparing the value of a given
person fit index with the 100α-th percentile of its distribution
under the RM. In such decision processes, the probability of
misclassifying an RM-behaving person as aberrant will be equal
to α.

Until recently, the indices in the RM were calculated
conditionally on a fixed ability level. Now it is clear that such
conditioning forces one to compare a given response pattern with
all possible patterns, of which many must have totally different
ability estimates (Hoijtink, 1986; Molenaar & Hoijtink, 1987). In
contrast, conditioning on the total score in the RM limits the
possible comparisons of a given pattern only to those for which
this conditioning is appropriate (all these patterns have the same
ability estimate). In addition, when conditioning on a fixed
ability level, the distribution of patterns depends on the ability
level. When conditioning on a total score, however, the
distribution of patterns is independent of the ability; thus, a
given pattern can be handled irrespective of the ability level it
was obtained at.

So far, decisions about the fit of a given pattern to the model
were made under the too optimistic supposition that the index
distribution could be approximated by a normal distribution. Now,
it is also evident that a more accurate approximation of the exact
distribution of the index is needed. To realize these two
recommendations, a new index based on conditioning on the total
score, together with a more accurate approximation of its
distribution, has been proposed (Hoijtink, 1986; Molenaar &
Hoijtink, 1987).

The object of this paper is to evaluate the power of Molenaar's
new index in detecting persons of aberrant behavior. For this
purpose, a simulation study was conducted. Three kinds of possibly
aberrant response behaviors from the RM were considered: guessing
to complete the test, guessing in accordance with the three-
parameter logistic model, and responding with two different
abilities on two different subsets of items. The detection of the
aberrants was carried out in two different ways: 1) the generated
item difficulties of the RM were known, for instance, from a



Detecting Aberrants 
4 



calibration study, and 2) the item difficulties were estimated from
data containing some aberrants. In the latter case, the evaluation
of the power of Molenaar's index was carried out with the help
of an iterative estimation procedure proposed in Kogut (1986).
Furthermore, a comparison of the power of Molenaar's and Levine's
index (Drasgow, Levine & Williams, 1985) was made.

Method 

In this study, person fit to the RM is assessed using the
previously mentioned Molenaar index (Hoijtink, 1986; Molenaar &
Hoijtink, 1987), i.e.,

$$P(X \mid r) = \frac{\prod_{i=1}^{k} \{\exp(-b_i)\}^{X_i}}{\gamma_r},$$

where $b_i$ is the difficulty of item $i$, $r = \sum_{i=1}^{k} X_i$ is the total score
for a pattern $X = (X_1, \ldots, X_k)$ on a test with $k$ items, and $\gamma_r$ is the
basic symmetric function of order $r$ (Fischer, 1974). The value of
the P(X|r) index for a given pattern is equal to the probability of
the pattern in the RM given its total score (Fischer, 1974). This
implies that if, for a given pattern, the value of the P(X|r) index
is very low, then this pattern is very improbable for an RM-behaving
person. Therefore, if the index value is lower than a fixed
percentile of the P(X|r) distribution for the RM, the proper decision
is to consider the pattern as aberrant.
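
As an illustration of the definition above, the following minimal sketch computes the index for one response pattern. It assumes the usual reading of the basic symmetric function, namely the sum over all patterns with total score r of the products of exp(-b_i) over the correctly answered items; the function names are hypothetical and are not taken from the original study.

```python
import math


def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_k] for item parameters eps_i = exp(-b_i)."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        # Go from high order to low so that each item is used at most once.
        for r in range(len(eps), 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma


def person_fit_index(pattern, difficulties):
    """P(X | r): probability of the pattern under the RM given its total score."""
    gamma = elementary_symmetric([math.exp(-b) for b in difficulties])
    r = sum(pattern)
    # Numerator: product of exp(-b_i) over the items answered correctly.
    numerator = math.exp(-sum(b for x, b in zip(pattern, difficulties) if x == 1))
    return numerator / gamma[r]


# With three items, solving only the hardest one is the least probable
# pattern among those with total score 1.
print(person_fit_index([0, 0, 1], [-1.0, 0.0, 1.0]))
```

Because the index is a conditional probability, its values over all patterns with the same total score sum to one, which gives a simple sanity check for any implementation.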

The fixed percentile for the distribution of the P(X|r) index is
a rather complicated function of the item difficulties because of
the nature of the $\gamma_r$'s. If the item difficulties are known, the
percentile can be calculated with the help of a complete
enumeration of the (exact) distribution of the index. As this
enumeration requires $\binom{k}{r}$ calculations for a given r, and $2^k - 2$
calculations for all r = 1, ..., k-1, it can be used only for small
values of k. If there are many items, it is more convenient to use
the 100α-th percentile of an appropriate chi-square approximation
to the index distribution (Molenaar & Hoijtink, 1987). Another
possibility is to estimate the percentile from a sample
distribution of RM patterns.
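
For small k, the complete enumeration mentioned above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the percentile is taken here, as an assumption, to be the smallest index value whose cumulative probability reaches α, so that all strictly lower values together have probability below α.

```python
import math
from itertools import combinations


def exact_index_values(difficulties, r):
    """Sorted index values of all C(k, r) patterns with total score r."""
    k = len(difficulties)
    weights = [
        math.exp(-sum(difficulties[i] for i in correct))
        for correct in combinations(range(k), r)
    ]
    gamma_r = sum(weights)  # basic symmetric function of order r
    # Under the RM the index value of a pattern is also its probability given r.
    return sorted(w / gamma_r for w in weights)


def exact_percentile(difficulties, r, alpha=0.05):
    """Smallest index value whose cumulative probability reaches alpha."""
    values = exact_index_values(difficulties, r)
    cumulative = 0.0
    for v in values:
        cumulative += v
        if cumulative >= alpha:
            return v
    return values[-1]  # only reached through rounding when alpha is close to 1


# Enumerating all r = 1, ..., k-1 visits 2^k - 2 patterns in total.
k = 5
print(sum(len(exact_index_values([0.0] * k, r)) for r in range(1, k)))
```

With 20 items the same enumeration would visit more than a million patterns, which is why the text turns to an approximation or to sampling.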

If the item difficulties are unknown and have to be estimated
from data including aberrant response patterns, then a new
impediment may arise. Due to the presence of aberrants, the
estimates of the item difficulties usually are biased, and so are
the values of the P(X|r) index and the 100α-th percentile. To remove
the bias, an iterative strategy is proposed where each iteration
consists of an approximation of the 100α-th percentile, the
decision which patterns are aberrant, and the actual removal of
these patterns from the data, respectively.

Generation of Data 

In order to evaluate the power of the P(X|r) index in detecting
persons of an aberrant behavior, a simulation study was conducted.
Both RM data and data containing aberrant response patterns were
generated.

The RM data were composed of ?500 response patterns generated
according to the RM for a test of 20 items. The item difficulties
and the distribution of ability were taken from cases already
studied by Hoijtink (1986). The item difficulties were placed
symmetrically around zero with more density in the neighbourhood of
zero (namely: ±0.11, ±0.32, ±0.54, ±0.77, ±1.02, ±1.28, ±1.58,
±1.92, ±2.35, ±3.00). The distribution of ability was normal,
N(0.0, 1.53). Such values for the item difficulties and this
distribution of ability can be met when a test is specially
designed for a homogeneous group of persons.

The aberrant response data consisted of 500 patterns generated
to get a specific aberrance from the RM. Three kinds of possibly
aberrant response behavior were considered: guessing to complete
the test, guessing in accordance with the three-parameter logistic
model, and responding with two different abilities by the same
person on two different subsets of items.

Guessing to complete the test occurs when a person responds to
some items at random, whereas his/her responses to the rest of the
items are according to the RM. This deviation from RM response
behavior is often observable with persons of low ability on the
most difficult items. However, in tests with a time limit there
might be no connection with the person's ability and/or the item
difficulties. For simulation purposes, two extreme subsets of items
(the five most difficult and the five easiest) were selected from
the item difficulties. Besides, in order to simulate aberrance of
this kind, in particular for persons of low ability, three normal
distributions of ability were selected: N(m, 1.53), where m = 0.0,
-1.0 and -2.0. On the selected subsets of items the responses were
generated with a constant probability of a correct response equal
to 0.2, 0.25 and 0.5; in other words, guessing at random on
multiple-choice items with five, four and two alternatives was
simulated. On the rest of the items, responses according to the RM
were generated. So, in all, eighteen data sets were generated,
where each data set included 500 aberrant patterns of the type
considered here.
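
The generation scheme for this type of aberrance can be sketched as follows. The sketch is illustrative only: all names are hypothetical, the random number generator and seed are arbitrary, and the value 1.53 is treated here as a standard deviation, which is an assumption because the text does not say whether it denotes a variance or a standard deviation.

```python
import math
import random

# Item difficulties quoted in the text: ten values mirrored around zero.
HALF = (0.11, 0.32, 0.54, 0.77, 1.02, 1.28, 1.58, 1.92, 2.35, 3.00)
DIFFICULTIES = sorted([-b for b in HALF] + list(HALF))

ABILITY_SPREAD = 1.53  # ASSUMPTION: interpreted as a standard deviation

FIVE_EASIEST = set(range(5))       # indices into the sorted difficulties
FIVE_HARDEST = set(range(15, 20))


def rasch_probability(theta, b):
    """Probability of a correct response under the RM."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def guessing_pattern(theta, guessed, p_guess, rng):
    """Random guessing on the items in `guessed`, RM responses elsewhere."""
    pattern = []
    for i, b in enumerate(DIFFICULTIES):
        p = p_guess if i in guessed else rasch_probability(theta, b)
        pattern.append(1 if rng.random() < p else 0)
    return pattern


def simulate_guessers(n, mean_ability, guessed, p_guess, seed=0):
    """Generate n aberrant patterns with abilities drawn from a normal law."""
    rng = random.Random(seed)
    return [
        guessing_pattern(rng.gauss(mean_ability, ABILITY_SPREAD), guessed, p_guess, rng)
        for _ in range(n)
    ]


# One of the eighteen conditions: low-ability persons guessing on the five
# most difficult items of a four-alternative multiple-choice test.
data = simulate_guessers(500, -2.0, FIVE_HARDEST, 0.25)
print(len(data), len(data[0]))
```

Crossing the two item subsets with the three mean abilities and the three guessing probabilities gives the eighteen data sets mentioned in the text.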

Next, guessing in accordance with the three-parameter logistic
(3PL) model was simulated. In applications of IRT models to data
from multiple-choice tests, the 3PL model is supposed to handle
guessing behavior on difficult items for persons of low ability
more adequately. The only difference between the 3PL model and the
RM is that the probability of the correct response for person v to
item i, $P_{vi}$, now is a function of three item parameters (difficulty
$b_i$, discrimination $a_i$, and pseudo-guessing parameter $c_i$). More
precisely,

$$P_{vi}(\theta_v) = c_i + \frac{1 - c_i}{1 + \exp[-1.7\,a_i(\theta_v - b_i)]}, \qquad v = 1, \ldots, n; \; i = 1, \ldots, k,$$

where $a_i > 0$, $b_i \in \mathbb{R}$ and $0 < c_i < 1$ are the parameters characterizing item
$i$ in the 3PL model (in the RM we have $c_i = 0$ and $a_i = 1$ for all
$i = 1, \ldots, k$). In order to simulate a more involved type of guessing
behavior, for all items the parameters of the 3PL model were
intentionally fixed at the following values: all discrimination
parameters were set equal to one, the difficulty parameters were
set at the same level as in the RM, and the pseudo-guessing
parameters at one of the values 0.2, 0.25 and 0.5. The response
patterns were generated again using the above three different
distributions of ability: N(m, 1.53) with m = 0.0, -1.0 and -2.0. So,
nine data sets were generated, each consisting of 500 response
patterns according to the 3PL model.
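
The 3PL response probability used for this second type of aberrance can be written as a small function. This is a sketch under stated assumptions: the names are hypothetical, and the scaling constant 1.7 is included as a parameter `d` only because it appears in the formula as printed.

```python
import math
import random


def p_3pl(theta, b, a=1.0, c=0.0, d=1.7):
    """Probability of a correct response under the 3PL model.

    a: discrimination, b: difficulty, c: pseudo-guessing parameter,
    d: scaling constant (1.7 in the formula quoted in the text).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


def simulate_3pl_pattern(theta, difficulties, c, rng, a=1.0):
    """One response pattern with common a and c for all items."""
    return [1 if rng.random() < p_3pl(theta, b, a, c) else 0 for b in difficulties]


# A person whose ability equals the item difficulty answers correctly with
# probability c + (1 - c) / 2; a very weak person approaches the floor c.
print(p_3pl(0.0, 0.0, c=0.25))
print(p_3pl(-10.0, 0.0, c=0.25))
```

The lower asymptote c is what separates these patterns from RM patterns: even at very low ability a fraction of the items is still answered correctly.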

Finally, if a person responds with a varying ability to
different items, his/her response behavior must be seen as aberrant
from the RM as well. This aberrance is of frequent occurrence for
persons with cultural and educational retardation or with certain
misconceptions. This phenomenon may also occur if someone copies
from a neighbour, or if certain item order by person interactions
arise (e.g., a slow start-up and sleepiness). In this study, the case
of a person displaying two distinct abilities on two different
subsets of items was also simulated. As in the case of guessing to
complete, the subsets of items were the five easiest and the five
most difficult. The ability distribution of aberrants was the same
as in the RM data, i.e., N(0.0, 1.53). On each of the subsets the
abilities were lowered by 1 or 2 logits in comparison with the
abilities on the rest of the items. Nevertheless, on both subsets
the responses were generated according to the RM. So, four data
sets, each consisting of 500 aberrant patterns, were used.

Approximation of the 100α-th Percentile of the Index

To approximate the 100α-th percentile of P(X|r), RM patterns
were sampled because of the large number (20) of items considered.
The approximation was carried out using the following three steps:
(1) RM patterns were sampled using estimates of the item
    difficulties and the person abilities to get at least 200
    patterns for each total score r, r = 1, ..., 19;
(2) the index values for the RM patterns sampled in (1) were
    calculated. Then, these values were collected and ordered per
    total score group (for each separate total score r, from the
    lowest, V_1, to the highest, V_n, index value);
(3) the 100α-th percentile of the distributions of the index
    sampled in (2) was approximated by the value of V_[αn] for
    each total score r.
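
The three steps above can be sketched as follows. This is a minimal illustration, not the procedure as implemented in the study: the names are hypothetical, abilities are drawn from a normal distribution with 1.53 treated as a standard deviation (an assumption), and the order statistic is taken at rank floor(α n), with a minimum of 1 (also an assumption).

```python
import math
import random


def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_k] for item parameters eps_i = exp(-b_i)."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        for r in range(len(eps), 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma


def sampled_percentiles(difficulties, alpha=0.05, n_min=200, seed=0,
                        draw_ability=lambda rng: rng.gauss(0.0, 1.53)):
    """Approximate the 100*alpha-th percentile of the index per total score."""
    rng = random.Random(seed)
    k = len(difficulties)
    gamma = elementary_symmetric([math.exp(-b) for b in difficulties])
    values = {r: [] for r in range(1, k)}
    # Step 1: sample RM patterns until every score group holds n_min patterns.
    while min(len(v) for v in values.values()) < n_min:
        theta = draw_ability(rng)
        pattern = [
            1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - b))) else 0
            for b in difficulties
        ]
        r = sum(pattern)
        if 0 < r < k:
            # Step 2: index value of the sampled pattern, stored per score group.
            log_num = -sum(b for x, b in zip(pattern, difficulties) if x == 1)
            values[r].append(math.exp(log_num) / gamma[r])
    # Step 3: order each group and read off the order statistic at rank alpha*n.
    percentiles = {}
    for r, v in values.items():
        v.sort()
        rank = max(1, math.floor(alpha * len(v)))
        percentiles[r] = v[rank - 1]
    return percentiles


print(sampled_percentiles([-1.0, -0.5, 0.0, 0.5, 1.0], n_min=50))
```

Extreme score groups fill slowly, so the number of patterns that must be sampled is driven by the rarest total score rather than by the total sample size.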

The Power of the Index in Detecting Aberrants 

For each generated data set, the value of the P(X|r) index was
compared with the estimates of the 5th percentile. If the index
value was lower than the estimate of this percentile, the observed
pattern was classified as aberrant. This implies that the
percentages of the RM-generated patterns misclassified as aberrant
were expected to be about 5%, both for each total score separately
and across total scores.

a) Item Difficulties Known from a Calibration Study

In applications of the RM we can possibly deal with cases where
the item difficulties are known (for instance, from a careful
calibration study). Accordingly, such a case was considered in this
paper. To evaluate the power of the P(X|r) index in such a case,
estimates of the item difficulties and the abilities were
calculated from the RM data only by the conditional maximum
likelihood method as implemented in the PML algorithm (Gustafsson,
1981). Further, the three steps to get an approximation of the
5th percentile of the index were carried out. Finally, the values
of the P(X|r) index for the RM and the aberrant patterns were
calculated, and the decisions about aberrance were made with the
help of the approximate percentiles.

b) Item Difficulties Estimated from Data Including Aberrants 

On the other hand, in applications of the RM we have to deal
with cases where the item difficulties are unknown and have to be
estimated from data containing aberrant response patterns.
Therefore, in this study the item difficulties and abilities were
also estimated from RM data mixed with data from aberrants. Having
these estimates, the power of the index was evaluated as in the
previous case. However, here the whole procedure was carried out
iteratively. This means that after the detection of aberrant
patterns, they were removed from the data and the procedure was
repeated until new aberrant patterns were no longer found.
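
The control flow of this iterative procedure can be sketched as a loop around three pluggable components. This is an outline only: `estimate`, `critical_values` and `index` are hypothetical callbacks standing in for the conditional maximum likelihood estimation, the percentile approximation and the P(X|r) computation, none of which is reproduced here, and the cap on the number of iterations is an added safeguard rather than part of the described procedure.

```python
def iterative_screening(patterns, estimate, critical_values, index, max_iter=20):
    """Repeat estimation, detection and removal until nothing new is flagged.

    estimate(patterns)            -> item difficulty estimates
    critical_values(difficulties) -> dict mapping total score r to a percentile
    index(pattern, difficulties)  -> person fit index value of one pattern
    """
    kept = list(patterns)
    removed = []
    difficulties = None
    for _ in range(max_iter):
        # Re-estimate the item difficulties from the patterns still in the data.
        difficulties = estimate(kept)
        cutoffs = critical_values(difficulties)
        still_kept, flagged = [], []
        for x in kept:
            r = sum(x)
            # A pattern is flagged when its index falls below the percentile
            # for its total score; scores without a percentile are never flagged.
            if r in cutoffs and index(x, difficulties) < cutoffs[r]:
                flagged.append(x)
            else:
                still_kept.append(x)
        if not flagged:
            break
        removed.extend(flagged)
        kept = still_kept
    return kept, removed, difficulties
```

Keeping the estimation step behind a callback makes it possible to run the same loop with known item difficulties by passing a function that ignores the data and returns fixed values.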

In order to compare the power of the P(X|r) index (conditioning
on the total score) with the power of the former indices
conditional on a fixed ability level, the ZL0 index was used
(Drasgow, Levine & Williams, 1985):

$$ZL_0 = \frac{L_0 - E(L_0 \mid \theta)}{\sqrt{\operatorname{Var}(L_0 \mid \theta)}},$$

where

$$L_0 = \log P(X \mid \theta) = \log\Big\{\prod_{i=1}^{k} P_{vi}(\theta)^{X_i}\,[1 - P_{vi}(\theta)]^{1 - X_i}\Big\}$$

and $E(L_0 \mid \theta)$, $\operatorname{Var}(L_0 \mid \theta)$ are the conditional expected value and
variance of $L_0$, respectively. Note that the ZL0 index is the
standardized version of L0, the origin of P(X|r). Working with the
ZL0 index, the standard normal approximation was used if a decision
about aberrance had to be made. When the obtained value of the ZL0
index was smaller than -1.96, the observed pattern was classified
as aberrant.
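
For comparison, the standardized log-likelihood index can be sketched for the RM as follows. The sketch assumes the standard closed forms for the conditional mean and variance of the log-likelihood of independent binary responses; the function names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the RM."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def zl0(pattern, theta, difficulties):
    """Standardized log-likelihood of a pattern at a fixed ability theta."""
    l0 = mean = var = 0.0
    for x, b in zip(pattern, difficulties):
        p = rasch_probability(theta, b)
        q = 1.0 - p
        l0 += x * math.log(p) + (1 - x) * math.log(q)
        mean += p * math.log(p) + q * math.log(q)   # E(L0 | theta), item by item
        var += p * q * math.log(p / q) ** 2         # Var(L0 | theta), item by item
    # Note: the variance is zero when theta equals every item difficulty.
    return (l0 - mean) / math.sqrt(var)


def is_aberrant(pattern, theta, difficulties, cutoff=-1.96):
    """Decision rule quoted in the text: flag when ZL0 falls below -1.96."""
    return zl0(pattern, theta, difficulties) < cutoff


# Missing the easy item and solving the hard one gives a negative value.
print(zl0([0, 1], 0.0, [-1.0, 1.0]))
```

In practice the ability is unknown and an estimate has to be substituted for theta, which is one reason why the normal approximation can be inaccurate.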

Results 

Guessing to Complete the Test 

The results for the mean power of the P(X|r) index in detecting
aberrance are, for the case of known item difficulties, presented
in Table 1, Column 3. These results are indicative of the
percentages of aberrants detected correctly with the P(X|r) index
over the three groups of aberrants with a normal distribution of
ability. As is clear from Table 1, the mean power of the P(X|r)
index depends to a high degree on the probability of guessing, on
the mean ability of the aberrants, and on the item difficulties of
the guessed items.



Insert Table 1 about here 



In the case of the five most difficult items, the mean power to
detect guessing to complete increases with the probability of
guessing and decreases with the mean ability of the guessers. In
contrast, for the five easiest items, the mean power decreases with
the probability of guessing and increases with the mean ability of
the guessers.

Actually, such results are to be expected if we notice the
inconsistency between the aberrant and RM patterns. For instance,
guessing at random on five items with the probability of guessing
equal to 0.2 results in 0, 1 or 2 correct responses with a
cumulative binomial probability of about 0.94. The same cumulative
probability with the guessing constant 0.5 will be obtained for 1,
2, 3 and 4 correct responses. For guessers, these predictions are
independent of the ability and the item difficulties of the items
the person guesses on. However, for an RM-behaving person, the
probability of the correct responses depends to a high degree on
his/her ability and on the item difficulties. To clear this
question up, let us consider the item characteristic curves (ICC's)
of the RM at a fixed ability level. In the case of five difficult
items it holds that the lower the ability of an RM-behaving person,
the higher the probability of incorrect responses on all of these
items. Therefore, the inconsistency between possibly random and RM
responses will increase with the probability of guessing but
decrease with the ability. In turn, with increasing inconsistency,
the power to detect aberrance will rise. This dependency, i.e., the
power as a function of ability for the probability of guessing
equal to 0.25, is illustrated in Figure 3 (see below). In the case
of five easy items, even an RM-behaving person of relatively low
ability should respond correctly to some of these items.
Therefore, the inconsistency and the power will decrease with the
probability of guessing but increase with the ability of the person.
For this case and for the probability of guessing equal to 0.25,
the power as a function of ability is illustrated in Figure 1. If
such a function is known, conclusions about the mean power over the
group (as the values presented in Table 1) can be drawn.
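
The two cumulative binomial probabilities quoted above for random guessing on five items can be checked directly. The helper names in this short sketch are hypothetical.

```python
from math import comb


def binomial_pmf(n, j, p):
    """Probability of exactly j correct guesses out of n."""
    return comb(n, j) * p ** j * (1.0 - p) ** (n - j)


def prob_correct_between(n, p, low, high):
    """Probability that the number of correct guesses lies in [low, high]."""
    return sum(binomial_pmf(n, j, p) for j in range(low, high + 1))


# Guessing probability 0.2: 0, 1 or 2 correct responses out of five.
print(round(prob_correct_between(5, 0.2, 0, 2), 4))  # 0.9421
# Guessing probability 0.5: 1, 2, 3 or 4 correct responses out of five.
print(round(prob_correct_between(5, 0.5, 1, 4), 4))  # 0.9375
```

Both values are close to 0.94, in line with the figure given in the text.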

The comparison between the mean power of the P(X|r) and ZL0
indices (Columns 3 and 4 of Table 1, respectively) clearly
indicates the superiority of the P(X|r) index, irrespective of the
conditions under which the power is evaluated. This confirms that
conditioning on the total score and using the given approximation
of the 100α-th percentile of the index results in more effective
detection of aberrants. In addition, when using the P(X|r) index,
unlike ZL0, it is also possible to obtain a constant Type I error.
Namely, for the P(X|r) index, the Type I error was 4.88% over the
group of all RM patterns and about 5%, with larger random
deviations, for each total score group separately. For ZL0, this
error was 2.20% on the average but 0.0% for the extreme and about
3.5% for the middle total score group. Thus, the ZL0 index is more
conservative in detecting aberrants.

The power of the P(X|r) index for the case of the item
difficulties estimated from data containing aberrants, thus when
the iterative procedure was used, is given in Figure 1. Here the
ability distribution of aberrants was the same as for the RM data,
i.e., normal N(0.0, 1.53). Besides, aberrants responded at random
with the probability of guessing equal to 0.25 on the subset of
the five easiest items. Recall that in the case of
known item difficulties, the mean power was 72.0% (see Table 1) and
the power was a rapidly increasing function of ability (see Figure
1, Graph #).



Insert Figure 1 about here 



Note that over a broad range of high abilities the power to detect
this aberrance was nearly 100%. When using estimates of the item
difficulties based on the data containing all aberrants, the power
is significantly lower. Nevertheless, after the first two
iterations a considerable improvement of the detection of aberrants
was obtained. Note also that the power converged almost
monotonically to the power for the case of known item difficulties;
however, the Type I error after the third iteration was a little
larger (5.94%) than for the case of known item difficulties
(4.88%).

The bias in the estimates of the item difficulties due to the
presence of about 20% aberrants is, for the four subsequent
iterations, given in Figure 2. At the first iteration,
thus before any aberrants were removed, the PML estimates of the
item difficulties were systematically biased. From Figure 2 the
large overestimation of the difficulties of the items the aberrants
guessed on and the underestimation of the higher item difficulties
is obvious. When the item difficulty approached the extremes of the
difficulty scale, over- and underestimation increased (to about 13
and about 3 times their standard errors, respectively). Using the
iterative procedure, a large part of the bias was removed. A very
significant reduction of bias in the estimates of item difficulties
was obtained after the first two iterations. Although subsequent
iterations still reduced the bias for the guessed items, at the
same time they introduced another bias in the opposite
direction (to be seen at the right extreme of the difficulty
scale). This new bias is due to the removal of the RM patterns that
were misclassified as aberrant (Type I error). Besides, it might be
due to the aberrant patterns still remaining in the data because
their inconsistency is too small to be detected by the P(X|r) index
(lack of power of the index).

On the other hand, the presence of about 20% aberrants affects
the estimates of the abilities in a similar way. The low abilities
were overestimated whereas the high abilities were underestimated.
Here, however, the bias changed monotonically and was much lower
than the standard errors of the ability estimates (to about 39% and
about 22% of the standard errors at the extremes of the ability
scale). Using the iterative procedure, a reduction of the bias in
the estimates of abilities was possible as well. The maximal
reduction of bias was obtained after two iterations. In the
subsequent iterations the bias increased.

Let us now consider the event of aberrants of low ability
guessing at random on the five most difficult items. Here the
ability distribution of aberrants was normal N(-2.0, 1.53), and as
in the previous case, the probability of guessing was equal to
0.25. Recall that in the case of known item
difficulties, the mean power for the P(X|r) index was 37.6% (see
Table 1), and the power was a decreasing function of ability (see
Figure 3, Graph •). From Figure 3 it is also evident



Insert Figure 3 about here 



that detecting this kind of aberrance with 100% certainty was not
possible, not even for persons of a very low ability. On the other
hand, using estimates of the item difficulties obtained from the
data containing all aberrants did not reduce the power of the index
significantly. These results are thus in contrast with those for
the five easiest items. However, the former level of the power from
the case of known item difficulties was almost reached when the
iterative procedure was used, after the second iteration.

The bias in the estimates of the item difficulties in question 
was in general smaller (see Figure 4). 

Insert Figure 4 about here



Almost complete reduction of the bias on the guessed items was
observed after the second iteration. Also, the optimal reduction of
bias over all items was obtained after two iterations. The next
iterations introduced another bias in the opposite direction, in
particular for the items with difficulties at the left extreme of
the scale (as can be seen in Figure 4).

Guessing in Accordance with the 3PL Model

For the case of known item difficulties, the mean power of
the P(X|r) index to detect guessing in accordance with the 3PL
model is shown in Table 2. It can be seen that the mean power



Insert Table 2 about here 



of the P(X|r) index increased with the pseudo-guessing parameter
and decreased with the mean ability. These results can be expected
if we notice the inconsistency between patterns obtained in the 3PL
model and in the RM. For this purpose, consider the ICC's for the
3PL and the RM for the ability level fixed at θ. The differences
between these ICC's increase with the difficulty of the items,
particularly for items with a difficulty on the right side
of θ. For these items, a 3PL pattern tends to contain more correct
responses than an RM pattern. If a person's ability is much below
the difficulty of the easiest item, the 3PL pattern still might
contain a certain number of correct responses placed at random over
the items. So, detecting this aberrance for a person of very low
ability will be easy, whereas for a person of very high ability it
may be almost impossible (see Figure 5 for this dependency).
Obviously, with increasing pseudo-guessing parameter values, the
differences between the 3PL and RM patterns tend to be larger;
hence, the detection of this aberrance should be improved.

The power of the P(X|r) index for item difficulties estimated
from data including aberrant patterns is shown in Figure 5. Here
the iterative procedure was applied as well. The ability
distribution of the aberrants was N(-2.0, 1.53) and the pseudo-
guessing parameters were set equal to 0.25 (see Table 2 for the
mean power in this case). As can be seen from Figure 5, a
considerable improvement of the power in detecting aberrants was



Insert Figure 5 about here 



obtained after the first iteration only. These results correspond
to the case of known item difficulties. The next iterations showed
a little increase in power but at the cost of increasing the Type I
error (about 5.98% after the third iteration).

The bias in the estimates of the item difficulties due to the
presence of the aberrants is shown in Figure 6.



Insert Figure 6 about here 



The presence of 500 aberrant patterns resulted in overestimation of
the easy item difficulties and underestimation of the difficult
items. This bias changed uniformly and was maximal at the extremes
of the difficulty scale (about 5 times the standard errors). After
the first iteration, most items had difficulty estimates within
their standard errors. Subsequent iterations introduced a bias in
the opposite direction; however, this was only observable for a few
items of extreme difficulty.

Responding with Two Different Abilities on Different Subsets of 
Items 

The mean power of the P(X|r) index to detect aberrance of this
kind, for the case of known item difficulties, is given in Table 3.



Insert Table 3 about here 

Notice that in this case, for a mean ability of guessers equal to
0.0, the P(X|r) index had much less power than in the case of
guessing to complete the test, implying that this type of aberrance
is more difficult to detect by the P(X|r) index. In order to
explain this, let us consider the ICC's in the RM, in particular
for those items on which the aberrance occurred. For the case of
five easy items, if an aberrant's ability (i.e., the one on the
rest of items) corresponds to the difficulty of the items in
question, then the larger the difference between the two abilities,
the larger the modifications in the response pattern compared with
the expected RM pattern. This is why such an aberrant person can be
detected. If an aberrant's ability differs much from the
difficulties of the five easy items, then fewer modifications in the
pattern can be expected; thus, such a person will be more difficult
to detect (this dependency can be seen in Figure 7; however, for
very high abilities only). Now, let us consider the subset of the
five difficult items. If an aberrant's ability is below the
difficulties of these items, then the response pattern should
contain fewer inconsistencies. Therefore, the percentages of
detected aberrants will be below the supposed 5% (see Table 3).

The power of the P(X|r) index for item difficulties estimated
from data including aberrant patterns is shown in Figure 7. Here
the ability distribution of aberrants was N(0.0, 1.53) and the
aberrants had an ability lowered by 2.0 logits on the five easiest
items compared to the ability on the rest of the items. As is
shown in Figure 7,

Insert Figure 7 about here



a small but uniform improvement of the power to detect aberrants
correctly can be observed for the first two iterations. Here it
also seems that the power is converging monotonically to the one
for the case of known item difficulties.

The bias in the estimates of the item difficulties due to the
presence of the aberrants is shown in Figure 8.



Insert Figure 8 about here 



Again, over- and underestimation of the item difficulties were
observed at the first iteration, i.e., when all aberrant patterns
were present in the data. The overestimation occurred on the items
on which there was aberrance, and underestimation on the rest of
the items. It is remarkable that on both subsets, the bias on the
items was more uniform than the one in the case of guessing to
complete. When using the iterative procedure, the optimal reduction
of bias seemed to be obtained after the second or third iteration,
but this reduction was not fully satisfactory.

It can be expected that other indices, specially developed to 
detect this type of aberrance (e.g., the unweighted between fit 
index by Smith, 1985), will be more efficient. The cost of this 
will then, of course, be a reduced power to detect other types of 
aberrance (Smith, 1985, 1986). 

Discussion 

The results of this simulation study show that in applications 
of the Rasch model to test data the P(X|r) index is more successful 
in detecting aberrant response patterns than the formerly more 
popular ZLq index. Namely, more power in the detection and a 
preassigned Type I error are obtained. This confirms the expected 
advantages with respect to conditioning on the total score (instead 
of on a fixed ability) and the use of a more accurate approximation 
to the exact distribution of the index (instead of a normal 
approximation). 
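
The role of conditioning on the total score can be made concrete. Under the Rasch model, the probability of a response pattern given its total score r does not depend on ability and can be written with elementary symmetric functions of exp(-difficulty). The sketch below computes only this conditional pattern probability; it does not reproduce the index itself or the approximation to its distribution, and the function names are hypothetical.

```python
import numpy as np


def elementary_symmetric(eps):
    """Return gamma, where gamma[r] sums the products of eps over all r-item subsets."""
    gamma = np.zeros(len(eps) + 1)
    gamma[0] = 1.0
    for k, e in enumerate(eps, start=1):
        # The right-hand side is evaluated before the in-place add,
        # so each item enters every subset at most once.
        gamma[1:k + 1] += e * gamma[:k]
    return gamma


def conditional_pattern_prob(x, delta):
    """Probability of the 0/1 pattern x given its total score, under the Rasch model."""
    x = np.asarray(x)
    delta = np.asarray(delta, dtype=float)
    eps = np.exp(-delta)                 # item "easiness" parameters
    r = int(x.sum())                     # total score of the pattern
    return float(np.prod(eps[x == 1]) / elementary_symmetric(eps)[r])
```

With item difficulties (-1, 0, 1), the pattern (1, 0, 0) receives a larger conditional probability than (0, 0, 1), although both have total score 1. Patterns with a small conditional probability given their score are the ones a person fit analysis regards as suspicious.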

Further, the dependency of the power of the index on ability has 
been seen to vary according to the kind of aberrance. For guessing 
on easy items to complete the test, the power is a rapidly 
increasing function of ability, but for guessing on difficult items 
as well as for guessing in accordance with the 3PL model it is a 
decreasing function. However, for responding with two different 
abilities by the same person on two different subsets of items, no 
clear dependency could be observed. 
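
A rough way to see why the direction of this dependency differs is to compare the success probabilities under each kind of guessing with those under the Rasch model. The sketch below rests on two assumptions that are not taken from the study's program: on guessed items the success probability is taken to equal the guessing probability regardless of ability, and the 3PL model is used with all discriminations fixed at 1. Function names are hypothetical.

```python
import numpy as np


def rasch(theta, delta):
    """Rasch model probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(theta) - np.asarray(delta))))


def p_guess_to_complete(theta, delta, guessed, g):
    """ASSUMPTION: on guessed items the success probability is g, whatever the ability."""
    return np.where(guessed, g, rasch(theta, delta))


def p_3pl(theta, delta, c):
    """3PL probability with discrimination fixed at 1 (assumption) and pseudo-guessing c."""
    return c + (1.0 - c) * rasch(theta, delta)
```

Under these assumptions the 3PL probability exceeds the Rasch probability by c(1 - P), where P is the Rasch probability, so the departure shrinks as ability grows. On easy items, by contrast, the gap between P and a fixed guessing probability widens with ability.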

If item difficulties are unknown and have to be estimated from 
data containing aberrant response patterns, then, in general, the 
power of the P(X|r) index is reduced. The results show that using 
the iterative procedure, the previous level of the power is 
recovered within a few iterations. This is, however, at the cost of 
a small increment in the Type I error. For these reasons, the use 
of the iterative procedure can be recommended if the detection of 
aberrants is a problem of interest. In such cases only a few 
iterations should be used. 
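
The general shape of such an iterative procedure can be sketched as a loop that alternates between estimating item difficulties and setting aside the patterns flagged as aberrant. The fragment below is only a skeleton under stated assumptions: `crude_difficulties` is a simple stand-in and not the estimation method of this study, `flag_aberrant` is a hypothetical callable standing for a decision rule based on a person fit index, and it is assumed that every person is re-examined at each iteration.

```python
import numpy as np


def crude_difficulties(x):
    """Stand-in estimator (assumption): centred negative logit of proportion correct."""
    p = np.clip(x.mean(axis=0), 0.01, 0.99)   # avoid infinite logits
    d = -np.log(p / (1.0 - p))
    return d - d.mean()


def iterative_estimation(x, flag_aberrant, estimate=crude_difficulties, n_iter=3):
    """Alternate between estimating difficulties and removing flagged patterns.

    x:             persons-by-items 0/1 response matrix.
    flag_aberrant: hypothetical callable (x, delta) -> boolean array, one per person.
    Returns the difficulty estimates of every iteration and the final flags.
    """
    keep = np.ones(x.shape[0], dtype=bool)    # first iteration uses all patterns
    history = []
    for _ in range(n_iter):
        if not keep.any():                    # nothing left to estimate from
            break
        delta = estimate(x[keep])
        history.append(delta)
        keep = ~flag_aberrant(x, delta)       # re-examine every person
    return history, ~keep
```

Keeping `n_iter` small mirrors the recommendation above: each extra pass may also remove Rasch model patterns that were flagged by mistake.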

Finally, due to the presence of aberrant patterns in the data, 
estimates of the item difficulties might be very biased, depending 
on the kind of aberrance. If the aberrance occurs on the whole 
test, the use of two or three iterations of the procedure generally 
results in a satisfactory reduction of the bias. If the aberrance 
occurs on a few items only, the results are questionable and not so 
satisfactory because of remaining bias. This final bias is due to 
the removal of the RM patterns misclassified as aberrant. Besides, 
it might also be due to the presence of aberrant patterns left in 
the data that could not be detected by the index. So, if there is 
no other way to estimate the item difficulties, the use of the 
iterative procedure with only a few iterations is recommended. 
However, one should take into account that some bias may remain in 
the estimates. 

Author's Note 



The author is indebted to Ron J.H. Engelen and Wim J. van der 
Linden for their comments on an earlier version of the paper. The 
content of the paper is, however, fully his own responsibility. 






Table 1 

Mean power of P(X|r) and ZLq in detecting 
guessing to complete the test 

Mean Ability     Probability        Power (in %) 
of Guessers*     of Guessing      P(X|r)      ZLq 

On Five Most Difficult Items 

    0.0             0.20           13.6 
                    0.25           16.8       11.6 
                    0.50           42.0       33.4 
   -1.0             0.20           21.0       12.6 
                    0.25           25.6       15.8 
                    0.50           55.6       47.2 
   -2.0             0.20           31.6       18.8 
                    0.25           37.6       23.6 
                    0.50           69.2       60.8 

On Five Most Easy Items 

    0.0             0.20           76.2       63.9 
                    0.25           72.0       60.8 
                    0.50           48.0       41.2 
   -1.0             0.20           57.6       42.6 
                    0.25           52.6       39.2 
                    0.50           30.8       23.0 
   -2.0             0.20           38.2       22.8 
                    0.25           33.2       21.0 
                    0.50           17.2       11.4 

* Ability of guessers is N(m, 1.53), where m = 0.0, 
-1.0 and -2.0. 

Table 2 

Mean power of P(X|r) in detecting 
guessing in accordance with the 3PL model 

Mean Ability     Pseudo-guessing     Power of P(X|r) 
of Guessers*     in 3PL              (in %) 

    0.0              0.20                17.9 
                     0.25                22.8 
                     0.50                32.6 
   -1.0              0.20                27.9 
                     0.25                31.3 
                     0.50                45.5 
   -2.0              0.20                39.0 
                     0.25                45.6 
                     0.50                62.5 

* Ability of guessers is N(m, 1.53), where m = 0.0, 
-1.0 and -2.0. 

Table 3 

Mean power of P(X|r) in detecting 
two different abilities on different subsets of items 

Difference in Abilities*     Power of P(X|r) 
                             (in %) 

On Five Most Difficult Items 

        -1.0                      1.8 
        -2.0                      1.0 

On Five Most Easy Items 

        -1.0                     13.8 
        -2.0                     33.0 

* Ability on the five-item subset is lower than 
ability on the rest of the items (ability of guessers 
is N(0.0, 1.53)). 

Figure 1. Power of P(X|r) in detecting guessing to complete on 
the five most easy items (probability of guessing 
is 0.25, ability of guessers is N(0.0, 1.53)). 
Note. Difficulties of guessed items are marked with a symbol 
along the difficulty axis. 



Figure 2. Bias in estimates of item difficulties by guessing to 
complete on the five most easy items (probability of 
guessing is 0.25, ability of guessers is N(0.0, 1.53)). 
(See Figure 1 for explanation of symbols). 
Horizontal axis: Difficulty. 

Figure 3. Power of P(X|r) in detecting guessing to complete on the 
five most difficult items (probability of guessing is 
0.25, ability of guessers is N(-2.0, 1.53)). 
(See Figure 1 for explanation of symbols). 

Figure 4. Bias in estimates of item difficulties by guessing to 
complete on the five most difficult items (probability of 
guessing is 0.25, ability of guessers is N(-2.0, 1.53)). 
(See Figure 1 for explanation of symbols). 
Horizontal axis: Difficulty. 

Figure 5. Power of P(X|r) in detecting guessing in accordance with 
the 3PL model (pseudo-guessing parameters are 0.25, 
ability of guessers is N(-2.0, 1.53)). 
(See Figure 1 for explanation of symbols). 



Figure 6. Bias in estimates of item difficulties by guessing in 
accordance with the 3PL model (pseudo-guessing parameters 
are 0.25, ability of guessers is N(-2.0, 1.53)). 
(See Figure 1 for explanation of symbols). 
Horizontal axis: Difficulty. 

Figure 7. Power of P(X|r) in detecting two different abilities on 
two different subsets of items (ability on the five most 
easy items is 2.0 lower, ability of aberrants is 
N(0.0, 1.53)). 
(See Figure 1 for explanation of symbols). 

Figure 8. Bias in estimates of item difficulties by two different 
abilities on different subsets of items (ability on the 
five most easy items is 2.0 lower, ability of aberrants 
is N(0.0, 1.53)). 
(See Figure 1 for explanation of symbols). 